Goto

Collaborating Authors

 local minimum point


On Convergence Lemma and Convergence Stability for Piecewise Analytic Functions

arXiv.org Artificial Intelligence

In this work, a convergence lemma for function $f$ being finite compositions of analytic mappings and the maximum operator is proved. The lemma shows that the set of $\delta$-stationary points near an isolated local minimum point $x^*$ is shrinking to $x^*$ as $\delta\to 0$. It is a natural extension of the version for strongly convex $C^1$ functions. However, the correctness of the lemma is subtle. Analytic mappings are necessary for the lemma in the sense that replacing it with differentiable or $C^\infty$ mappings makes the lemma false. The proof is based on stratification theorems of semi-analytic sets by {\L}ojasiewicz. An extension of this proof presents a geometric characterization of the set of stationary points of $f$. Finally, a notion of stability on stationary points, called convergence stability, is proposed. It asks, under small numerical errors, whether a reasonable convergent optimization method started near a stationary point should eventually converge to the same stationary point. The concept of convergence stability becomes nontrivial qualitatively only when the objective function is both nonsmooth and nonconvex. Via the convergence lemma, an intuitive equivalent condition for convergence stability of $f$ is proved. These results together provide a new geometric perspective to study the problem of "where-to-converge" in nonsmooth nonconvex optimization.


Active learning for enumerating local minima based on Gaussian process derivatives

arXiv.org Machine Learning

When the evaluation of a blackbox function is expensive, it is often difficult to exhaustively investigate the function in the entire input domain. Active learning (AL) has been developed as a method for effectively selecting the input points at which the function evaluations are helpful for the target task. For example, if the target task is to find the global minimum, it is reasonable to evaluate the function at the input points which are likely to be global minima (this AL problem has been intensively studied in the context of Bayesian Optimization (BO) [9, 2, 1, 6, 14, 5]). In this paper, we study the problem of enumerating local minima (or maxima) of a black-box function. In many applications, it is beneficial to identify the positions of local minima and/or maxima because it helps to roughly grasp the "shape" of the black-box function.


Quasi-potential as an implicit regularizer for the loss function in the stochastic gradient descent

arXiv.org Machine Learning

We interpret the variational inference of the Stochastic Gradient Descent (SGD) as minimizing a new potential function named the \textit{quasi-potential}. We analytically construct the quasi-potential function in the case when the loss function is convex and admits only one global minimum point. We show in this case that the quasi-potential function is related to the noise covariance structure of SGD via a partial differential equation of Hamilton-Jacobi type. This relation helps us to show that anisotropic noise leads to faster escape than isotropic noise. We then consider the dynamics of SGD in the case when the loss function is non-convex and admits several different local minima. In this case, we demonstrate an example that shows how the noise covariance structure plays a role in "implicit regularization", a phenomenon in which SGD favors some particular local minimum points. This is done through the relation between the noise covariance structure and the quasi-potential function. Our analysis is based on Large Deviations Theory (LDT), and they are validated by numerical experiments.